Approximate Nearest Neighbors Algorithm - a short version

Authors

  • Peter W. Jones
  • Andrei Osipov
  • Vladimir Rokhlin
Abstract

dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find the $k$ nearest neighbors of each $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to $T \cdot N \cdot (d \log d + k \cdot (d + \log k) \cdot \log N) + N \cdot k^2 \cdot (d + \log k)$, where $T$ is the number of iterations performed. The memory requirements of the procedure are of order $N \cdot (d + k)$. A byproduct of the scheme is a data structure that permits a rapid search for the $k$ nearest neighbors among $\{x_j\}$ of an arbitrary point $x \in \mathbb{R}^d$. The cost of each such query is proportional to $T \cdot (d \log d + \log(N/k) \cdot k \cdot (d + \log k))$, and the memory requirements for the requisite data structure are of order $N \cdot (d + k) + T \cdot (d + N)$. The algorithm uses random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of $\{x_j\}$ and illustrate its performance via several numerical examples.
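
The abstract names three ingredients: random rotations, a divide-and-conquer split, and a local graph search. The Python below is a minimal sketch of how such pieces could fit together, under stated assumptions; it is not the authors' implementation. The function names (`random_rotation`, `split_leaves`, `offer`, `approx_knn`), the median split on cycling coordinates, the minimum leaf size of $k + 1$, and the single neighbors-of-neighbors pass are all hypothetical choices made for illustration. The dense Gram-Schmidt rotation costs $O(d^2)$ per point, whereas the $d \log d$ term in the stated bound presumably reflects a faster structured rotation.

```python
import math
import random


def random_rotation(d, rng):
    """Random d x d orthogonal matrix (list of rows): Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # discard (vanishingly rare) near-dependent draws
            rows.append([a / norm for a in v])
    return rows


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def split_leaves(idx, pts, depth, min_leaf):
    """Divide and conquer (hypothetical rule): halve at the median of one coordinate,
    cycling through axes, while both halves keep at least min_leaf points."""
    if len(idx) < 2 * min_leaf:
        return [idx]
    axis = depth % len(pts[0])
    idx = sorted(idx, key=lambda i: pts[i][axis])
    mid = len(idx) // 2
    return (split_leaves(idx[:mid], pts, depth + 1, min_leaf)
            + split_leaves(idx[mid:], pts, depth + 1, min_leaf))


def offer(best, i, j, points, k):
    """Try candidate j for point i; best[i] stays a sorted list of <= k (dist2, index) pairs."""
    if i == j or any(c == j for _, c in best[i]):
        return
    best[i].append((dist2(points[i], points[j]), j))
    best[i].sort()
    del best[i][k:]


def approx_knn(points, k, T=4, seed=0):
    """Approximate k nearest neighbors of every point (indices, nearest first)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    best = [[] for _ in range(n)]
    for _ in range(T):
        # Rotation preserves distances, but the axis-aligned splits below
        # cut the point set differently on each iteration.
        rot = random_rotation(d, rng)
        rotated = [[sum(r[a] * p[a] for a in range(d)) for r in rot] for p in points]
        for leaf in split_leaves(list(range(n)), rotated, 0, k + 1):
            for i in leaf:  # brute force inside each small leaf
                for j in leaf:
                    offer(best, i, j, points, k)
    # Local graph search: one pass over the neighbors of current neighbors.
    graph = [[j for _, j in b] for b in best]
    for i in range(n):
        for j in graph[i]:
            for m in graph[j]:
                offer(best, i, m, points, k)
    return [[j for _, j in b] for b in best]
```

For example, `approx_knn(points, k=5, T=4)` returns, for each point, five candidate neighbor indices sorted by distance (assuming $N > k$). Raising `T` spends more CPU time for better accuracy, loosely mirroring the factor $T$ in the cost estimate above; the leaf-size rule guarantees every leaf holds at least $k + 1$ points, so each point has a full candidate list after the first iteration.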

Similar resources

A Randomized Approximate Nearest Neighbors Algorithm - a Short Version

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet predominate in users' lives, bringing business opportunities and time savings, threats such as information theft and penetration into systems have also arisen in both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...

Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning

We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...

EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph

Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...

Fast Large-Scale Approximate Graph Construction for NLP

Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorith...

Publication date: 2011